Skip to content

Conversation

@Sujanadh
Copy link
Contributor

Description

  • Added a stream parameter in the RawDataClientConfig in order to avoid unnecessarily downloading data extracts on disc.

@Sujanadh Sujanadh requested a review from spwoodcock June 20, 2025 12:31
@Sujanadh Sujanadh self-assigned this Jun 20, 2025
@github-actions github-actions bot added the bug Something isn't working label Jun 20, 2025
@Sujanadh Sujanadh force-pushed the fix/allow-stream-data-url branch from de435d6 to a0547a7 Compare June 20, 2025 12:36
@spwoodcock spwoodcock requested a review from emirfabio June 20, 2025 15:09

async def get_osm_data(geometry: Union[Dict[str, Any], str], **kwargs) -> RawDataResult:
async def get_osm_data(
geometry: Union[Dict[str, Any], str], stream: Optional[bool] = None, **kwargs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I think adding the stream param is not necessary, as this is captured by **kwargs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get it from **kwargs, we can do something like this:

    config = RawDataClientConfig.default()

    if (stream := kwargs.pop("stream", False)):
        config.stream = stream

    client = RawDataClient(config=config)
    return await client.get_osm_data(geometry, **kwargs)

@spwoodcock
Copy link
Member

I also replaced typing.[List,Dict,Union,Tuple] with the equivalent typing syntax built into Python now

@spwoodcock spwoodcock requested review from spwoodcock and removed request for spwoodcock June 20, 2025 16:57
@spwoodcock
Copy link
Member

I refactored this to use RawDataOutputOptions, hope you don't mind @Sujanadh!

@emirfabio would love if you could review this to see if it makes sense / if you prefer another approach.
There is currently also one test failing - I didn't get to dig into it

@spwoodcock
Copy link
Member

@Sujanadh I'm not sure if Emir is around to check this over, but what do you think?

Is it working for you?

@Sujanadh
Copy link
Contributor Author

looks good to me.
But response seems like this:

{
  "metadata": {
    "task_id": "9aa3a4e0-7252-4ba6-89f9-8c730c2c2afd",
    "format_ext": "geojson",
    "timestamp": "",
    "size_bytes": 63219,
    "file_name": "fmtm_data_extract",
    "download_url": "https://s3.dualstack.us-east-1.amazonaws.com/production-raw-data-api/default/fmtm_data_extract_geojson_uid_9aa3a4e0-7252-4ba6-89f9-8c730c2c2afd.geojson",
    "is_zipped": false,
    "bbox": null
  },
  "path": null,
  "data": {
    "download_url": "https://s3.dualstack.us-east-1.amazonaws.com/production-raw-data-api/default/fmtm_data_extract_geojson_uid_9aa3a4e0-7252-4ba6-89f9-8c730c2c2afd.geojson",
    "file_name": "fmtm_data_extract",
    "process_time": "a second",
    "query_area": "0.04 Sq Km",
    "binded_file_size": "0.06 MB",
    "zip_file_size_bytes": 63219
  },
  "extracted": false,
  "original_path": null,
  "extracted_files": null
}

There are repeated keys and values in both metadata and data . We can keep metadata such as process time, filename, query area and so on in the metadata section itself. Need to refactor it.

@spwoodcock
Copy link
Member

looks good to me.
But response seems like this:

{
  "metadata": {
    "task_id": "9aa3a4e0-7252-4ba6-89f9-8c730c2c2afd",
    "format_ext": "geojson",
    "timestamp": "",
    "size_bytes": 63219,
    "file_name": "fmtm_data_extract",
    "download_url": "https://s3.dualstack.us-east-1.amazonaws.com/production-raw-data-api/default/fmtm_data_extract_geojson_uid_9aa3a4e0-7252-4ba6-89f9-8c730c2c2afd.geojson",
    "is_zipped": false,
    "bbox": null
  },
  "path": null,
  "data": {
    "download_url": "https://s3.dualstack.us-east-1.amazonaws.com/production-raw-data-api/default/fmtm_data_extract_geojson_uid_9aa3a4e0-7252-4ba6-89f9-8c730c2c2afd.geojson",
    "file_name": "fmtm_data_extract",
    "process_time": "a second",
    "query_area": "0.04 Sq Km",
    "binded_file_size": "0.06 MB",
    "zip_file_size_bytes": 63219
  },
  "extracted": false,
  "original_path": null,
  "extracted_files": null
}

There are repeated keys and values in both metadata and data . We can keep metadata such as process time, filename, query area and so on in the metadata section itself. Need to refactor it

Agree, there is a bit of duplication!

For a future PR 😁

@spwoodcock spwoodcock merged commit 7a6643f into main Jun 25, 2025
2 checks passed
@spwoodcock spwoodcock deleted the fix/allow-stream-data-url branch June 25, 2025 08:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working dependency docs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants